Structure-Based Chemical Shift Prediction Using Random Forests Non-Linear Regression
Abstract
Protein nuclear magnetic resonance (NMR) chemical shifts are among the most accurately measurable spectroscopic parameters and are closely correlated with protein structure because they depend on the local electronic environment. The precise nature of this correlation remains largely unknown. Accurate prediction of chemical shifts from the atomic coordinates of existing structures will permit close study of this relationship. This paper presents a novel approach to chemical shift prediction from protein structure, based on non-linear regression. The regression model combines quantum, classical and empirical variables and provides a statistically significant improvement in prediction accuracy over existing chemical shift predictors across protein backbone atom types. The results presented here were obtained using the Random Forest regression algorithm on a protein entry data set derived from the RefDB re-referenced chemical shift database.
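As a rough illustration of the regression technique named in the abstract, the following is a minimal sketch of random forest regression: bootstrap-resampled regression trees, a random subset of features tried at each split, and predictions averaged over the trees. It is not the authors' model. The three features (two torsion-like angles and an uninformative variable), the synthetic target, and all function names and hyperparameters are assumptions made only for this example; no RefDB data or real chemical shifts are used.

```python
import math
import random

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def grow_tree(X, y, rng, depth=5, min_leaf=3, n_try=2):
    """Grow one regression tree; each split examines n_try random features."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return sum(y) / len(y)  # leaf: mean of the targets in this node
    best = None
    for f in rng.sample(range(len(X[0])), n_try):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] < t]
            right = [i for i, row in enumerate(X) if row[f] >= t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:
        return sum(y) / len(y)  # no admissible split: make a leaf
    _, f, t, left, right = best
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left],
                      rng, depth - 1, min_leaf, n_try),
            grow_tree([X[i] for i in right], [y[i] for i in right],
                      rng, depth - 1, min_leaf, n_try))

def predict_tree(node, row):
    """Follow splits until a leaf (a plain float) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node

def fit_forest(X, y, n_trees=20, seed=0):
    """Train each tree on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)

def make_data(n, rng):
    """Synthetic stand-in data: a non-linear target of two angle-like features."""
    X, y = [], []
    for _ in range(n):
        phi = rng.uniform(-math.pi, math.pi)
        psi = rng.uniform(-math.pi, math.pi)
        unrelated = rng.uniform(0.0, 1.0)  # carries no signal
        X.append([phi, psi, unrelated])
        y.append(2 * math.sin(phi) + 0.5 * psi ** 2 + rng.gauss(0.0, 0.1))
    return X, y

def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

data_rng = random.Random(42)
X_train, y_train = make_data(100, data_rng)
X_test, y_test = make_data(40, data_rng)

forest = fit_forest(X_train, y_train)
forest_rmse = rmse([predict_forest(forest, row) for row in X_test], y_test)

# Baseline for comparison: always predict the training mean.
train_mean = sum(y_train) / len(y_train)
baseline_rmse = rmse([train_mean] * len(y_test), y_test)

print(f"forest RMSE:   {forest_rmse:.3f}")
print(f"baseline RMSE: {baseline_rmse:.3f}")
```

Comparing the held-out error against a constant-mean baseline is only a sanity check on the toy data; it says nothing about the accuracy figures reported in the paper.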
Related articles
Prediction of Red Mud Bound-Soda Losses in Bayer Process Using Neural Networks
In the Bayer process, the reaction of silica in bauxite with caustic soda causes the loss of a large amount of NaOH. In this research, the bound-soda losses in the Bayer process solid residue (red mud) are predicted using intelligent techniques. This method, based on the application of regression and artificial neural networks (ANN), has been used to predict red mud bound-soda losses in Iran Alumina C...
Regression Trees and Random forest based feature selection for malaria risk exposure prediction
This paper deals with predicting the number of Anopheles mosquitoes, the main vector of malaria, using environmental and climate variables. Variable selection is based on an automatic machine learning method using regression trees and random forests combined with stratified two-level cross-validation. The minimum threshold of variables importance is accessed using the quadratic distance of variabl...
Random forests for survival analysis using maximally selected rank statistics
The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption is not always fulfilled. An alternative approach is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible sp...
Prediction of toxicity of aliphatic carboxylic acids using adaptive neuro-fuzzy inference system
Toxicity of 38 aliphatic carboxylic acids was studied using non-linear quantitative structure-toxicity relationship (QSTR) models. The adaptive neuro-fuzzy inference system (ANFIS) was used to construct the nonlinear QSTR models in all stages of study. Two ANFIS models were developed based upon different subsets of descriptors. The first one used log K_ow and E_LUMO as inputs and had good predicti...
Evaluating Random Forests for Survival Analysis using Prediction Error Curves.
Prediction error curves are increasingly used to assess and compare predictions in survival analysis. This article surveys the R package pec which provides a set of functions for efficient computation of prediction error curves. The software implements inverse probability of censoring weights to deal with right censored data and several variants of cross-validation to deal with the apparent err...